On the Conditional Independence Implication Problem: 
A Lattice-Theoretic Approach* 
A lattice-theoretic framework is introduced
that permits the study of the conditional in- 
dependence (CI) implication problem relative 
to the class of discrete probability measures. 
Semi-lattices are associated with CI state- 
ments and a finite, sound and complete in- 
ference system relative to semi-lattice inclu- 
sions is presented. This system is shown to 
be (1) sound and complete for saturated CI 
statements, (2) complete for general CI state- 
ments, and (3) sound and complete for stable 
CI statements. These results yield a criterion 
that can be used to falsify instances of the 
implication problem and several heuristics 
are derived that approximate this "lattice- 
exclusion" criterion in polynomial time. Fi- 
nally, we provide experimental results that 
relate our work to results obtained from other 
existing inference algorithms. 

1 Introduction 

Conditional independence is an important concept in 
many calculi for dealing with knowledge and uncer- 
tainty in artificial intelligence. The notion plays a 
fundamental role for learning and reasoning in prob- 
abilistic systems which are successfully employed in 
areas such as computer vision, computational biology, 
and robotics. Hence, new theoretical findings and al- 
gorithmic improvements have the potential to impact 
many fields of research. A central issue for reason- 
ing about conditional independence is the probabilistic 
conditional independence implication problem, that is, 
to decide whether a CI statement is entailed by a set 
of other CI statements relative to the class of discrete 
probability measures. While it remains open whether 
this problem is decidable, it is known that there ex- 
ists no finite, sound and complete inference system 
(Studeny [8]). However, there exist finite sound inference systems that have attracted special interest.
The most prominent is the semi-graphoid axiom sys- 
tem which was introduced as a set of sound inference 
rules relative to the class of discrete probability measures (Pearl [6]). One of the main contributions of this
paper is to extend the semi-graphoids to a finite inference system, denoted by A, which we will show to be
(1) sound and complete for saturated CI statements, (2) complete for general CI statements, and (3) sound
and complete for stable CI statements (de Waal and van der Gaag [2]), all relative to the class of discrete
probability measures.

The techniques we use to obtain these results are made 
possible through the introduction of a lattice-theoretic 
framework. In this approach, semi-lattices are associ- 
ated with conditional independence statements, and A 
is shown to be sound and complete relative to certain 
inclusion relationships on these semi-lattices. To make 
the connection between this framework and the condi- 
tional independence implication problem, we first link 
the latter to an addition-based version of the problem. 
In particular, we introduce the additive implication 
problem for CI statements relative to certain classes 
of real-valued functions and specify properties of 
these classes that guarantee soundness and complete- 
ness, respectively, of A for the implication problem. 
Through the concept of multi-information functions 
induced by probability measures (Studeny [9]), we 
link the additive implication problem for this class of 
functions to the probabilistic CI implication problem. 
The combination of the lattice-inclusion techniques 
and the completeness result for conditional indepen- 
dence statements allows us to derive criteria that can 
be used to falsify instances of the implication problem. 
We show experimentally that these criteria, some of 
which can be tested for in polynomial time, work very 
effectively, and we relate the experimental results to 
those obtained from a racing algorithm introduced by 
Bouckaert and Studeny [1]. 



2 CI Statements and System A

We define CI statements and introduce the finite inference system A for reasoning about the conditional independence implication problem. We will often write $AB$ for the union $A \cup B$, $ab$ for the set $\{a, b\}$, and $a$ for the singleton set $\{a\}$ whenever the interpretation is clear from the context. Throughout the paper, $S$ denotes a finite implicit set of statistical variables.

Definition 2.1. The expression $I(A,B|C)$, where $A$, $B$, and $C$ are pairwise disjoint subsets of $S$, is called a conditional independence (CI) statement. If $ABC = S$, we say that $I(A,B|C)$ is saturated. If $A = \emptyset$ or $B = \emptyset$, we say that $I(A,B|C)$ is trivial.

Figure 1: The inference rules of system A.

The set of inference rules in Figure 1 will be denoted by A. The triviality, symmetry, decomposition, and contraction rules are part of the semi-graphoid axioms (Pearl [6]). Strong union and strong contraction are two additional inference rules. Note that strong union is not a sound inference rule relative to the class of discrete probability measures. The derivability of a CI statement $c$ from a set of CI statements $\mathcal{C}$ under the inference rules of system A is denoted by $\mathcal{C} \vdash c$. The closure of $\mathcal{C}$ under A, denoted $\mathcal{C}^+$, is the set $\{c \mid \mathcal{C} \vdash c\}$.

Lemma 2.2 (de Waal and van der Gaag [2]). The inference rule composition

$I(A,B|C) \wedge I(A,D|C) \rightarrow I(A,BD|C)$    Composition

can be derived using strong union and contraction. Indeed, strong union applied to $I(A,B|C)$ yields $I(A,B|CD)$, and contraction applied to $I(A,B|CD)$ and $I(A,D|C)$ then yields $I(A,BD|C)$.

3 Lattice-Theoretic Framework

First, we introduce the lattice-theoretic framework which is at the core of the theory developed in this paper. The approach we take is made possible through the association of conditional independence statements with semi-lattices. In this section, we prove that inference system A is sound and complete relative to specific semi-lattice inclusions. This result forms the backbone of our work on the conditional independence implication problem.

3.1 Semi-Lattices of CI Statements

Given two subsets $A$ and $B$ of $S$, we will write $[A,B]$ for the lattice $\{U \mid A \subseteq U \text{ and } U \subseteq B\}$. We will now associate semi-lattices with conditional independence statements.

Definition 3.1. Let $I(A,B|C)$ be a CI statement. The semi-lattice of $I(A,B|C)$ is defined by
$$\mathcal{L}(A,B|C) = [C,S] - ([A,S] \cup [B,S]).$$

We will often write $\mathcal{L}(c)$ to denote the semi-lattice of a conditional independence statement $c$, and $\mathcal{L}(\mathcal{C})$ to denote the union of semi-lattices, $\bigcup_{c' \in \mathcal{C}} \mathcal{L}(c')$, of a set of conditional independence statements $\mathcal{C}$. Using the notion of witnesses of a conditional independence statement, we can rewrite the associated semi-lattice as a difference-free union of lattices.

Definition 3.2. Let $I(A,B|C)$ be a CI statement. The set of all witness sets of $I(A,B|C)$ is defined as
$$\mathcal{W}(A,B|C) = \{\{a,b\} \mid a \in A \text{ and } b \in B\}.$$

Note that if $I(A,B|C)$ is trivial, then $\mathcal{W}(A,B|C) = \emptyset$.

Lemma 3.3. Let $c = I(A,B|C)$ be a CI statement. Then $\mathcal{L}(c) = \bigcup_{W \in \mathcal{W}(c)} [C, \overline{W}]$, where $\overline{W} = S - W$ denotes the complement of $W$ in $S$.

Example 3.4. Let $S = \{a,b,c,d\}$ and let $I(bc,d|a)$ be a CI statement. Then $\mathcal{L}(bc,d|a) = [a,S] - ([bc,S] \cup [d,S]) = \{a, ab, ac\}$. Furthermore, $\mathcal{W}(bc,d|a) = \{bd, cd\}$ and, therefore, $\mathcal{L}(bc,d|a) = [a,ac] \cup [a,ab] = \{a, ab, ac\}$, using Lemma 3.3.
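
The two descriptions of the semi-lattice in Example 3.4 can be compared mechanically. The following Python sketch is illustrative only: the function names (`interval`, `semi_lattice`, `semi_lattice_via_witnesses`, `show`) and the encoding of variable sets as strings of single-letter names are assumptions made here, not part of the paper. It computes the semi-lattice once from Definition 3.1 and once as the witness-based union of Lemma 3.3, where the upper bound of each interval is S minus the witness set, as in the example.

```python
from itertools import combinations

def interval(lower, upper):
    """Return the lattice [lower, upper] = {U | lower <= U <= upper}."""
    lower, upper = frozenset(lower), frozenset(upper)
    if not lower <= upper:
        return set()
    free = sorted(upper - lower)
    return {lower | frozenset(extra)
            for r in range(len(free) + 1)
            for extra in combinations(free, r)}

def semi_lattice(A, B, C, S):
    """Definition 3.1: [C, S] - ([A, S] u [B, S])."""
    return interval(C, S) - (interval(A, S) | interval(B, S))

def witnesses(A, B):
    """Definition 3.2: all two-element sets {a, b} with a in A and b in B."""
    return {frozenset({a, b}) for a in A for b in B}

def semi_lattice_via_witnesses(A, B, C, S):
    """Lemma 3.3: union of the lattices [C, S - W] over all witness sets W."""
    S = frozenset(S)
    result = set()
    for W in witnesses(A, B):
        result |= interval(C, S - W)
    return result

def show(family):
    """Render a family of sets compactly, e.g. ['a', 'ab', 'ac']."""
    return sorted("".join(sorted(U)) for U in family)

S = "abcd"
print(show(semi_lattice("bc", "d", "a", S)))                # ['a', 'ab', 'ac']
print(show(semi_lattice_via_witnesses("bc", "d", "a", S)))  # ['a', 'ab', 'ac']
```

For a trivial statement both functions return the empty family, which matches the remark that a trivial statement has no witness sets.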

3.2 Soundness and Completeness of Inference 
System A for Semi-Lattice Inclusion 

We will prove that system A is sound and complete relative to semi-lattice inclusion. First, we show that if a CI statement can be derived from a set of CI statements under A, then we have a set inclusion relationship between their associated semi-lattices.

Proposition 3.5. Let $\mathcal{C}$ be a set of CI statements, and let $c$ be a CI statement. If $\mathcal{C} \vdash c$, then $\mathcal{L}(\mathcal{C}) \supseteq \mathcal{L}(c)$.

Proof. We prove the statement for strong contraction. The proofs for the other inference rules in A are analogous and are omitted. Let $U \in \mathcal{L}(D,E|C)$. Then $U \supseteq C$. If $U \supseteq A$, then $U \in \mathcal{L}(D,E|AC)$. If $U \supseteq B$, then $U \in \mathcal{L}(D,E|BC)$. If $U \not\supseteq A$ and $U \not\supseteq B$, then $U \in \mathcal{L}(A,B|C)$. □

A CI statement can be equivalent to a set of other CI statements with respect to the inference system A. The following definition of the witness decomposition of a CI statement serves to establish this property.

Definition 3.6. The witness decomposition of the CI statement $I(A,B|C)$ is defined by $wdec(A,B|C) := \{I(a,b|C) \mid a \in A \text{ and } b \in B\}$.



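Inference rules of system A (Figure 1):

$I(A,\emptyset|C)$    Triviality
$I(A,B|C) \rightarrow I(B,A|C)$    Symmetry
$I(A,BD|C) \rightarrow I(A,D|C)$    Decomposition
$I(A,B|CD) \wedge I(A,D|C) \rightarrow I(A,BD|C)$    Contraction
$I(A,B|C) \rightarrow I(A,B|CD)$    Strong union
$I(A,B|C) \wedge I(D,E|AC) \wedge I(D,E|BC) \rightarrow I(D,E|C)$    Strong contraction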



A useful property of the witness decomposition of a 
CI statement is that its closure under A is the same 
as the closure of the CI statement itself. In addition, 
the semi-lattice of a CI statement is equal to the semi- 
lattice of its witness decomposition. 

Proposition 3.7. Let $c$ be a CI statement. Then (1) $\{c\}^+ = wdec(c)^+$; and (2) $\mathcal{L}(c) = \bigcup_{c' \in wdec(c)} \mathcal{L}(c')$.

Proof. To prove the first statement, let $c = I(A,B|C)$ and $I(a,b|C) \in wdec(c)$. Then $I(a,b|C)$ can be derived from $I(A,B|C)$ by applications of the decomposition rule. Hence, $wdec(c)^+ \subseteq \{c\}^+$. By Definition 3.6 we know that for every $a \in A$ and for all $b \in B$ one has $I(a,b|C) \in wdec(c)$. By repeatedly applying composition, we can infer the CI statement $I(a,B|C)$. Hence, for all $a \in A$, one has $I(a,B|C) \in wdec(c)^+$ and by symmetry $I(B,a|C) \in wdec(c)^+$. Again, by applying composition repeatedly, we can infer $I(B,A|C)$ and by symmetry $I(A,B|C)$. Hence, $\{c\}^+ \subseteq wdec(c)^+$.

To prove the second statement, let $I(a,b|C) \in wdec(c)$ and $W = \{a,b\}$. Then $\mathcal{L}(a,b|C) = [C, \overline{W}]$. The statement now follows directly from Definition 3.6 and Lemma 3.3. □
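
The second part of Proposition 3.7 is easy to confirm on a small instance. The sketch below is a hedged illustration: the tuple encoding `(A, B, C)` of a CI statement and the helper names `powerset`, `semi_lattice`, and `wdec` are choices made here for the example. The semi-lattice is computed from the equivalent membership condition "U contains C but contains neither all of A nor all of B".

```python
from itertools import combinations

def powerset(S):
    """All subsets of S as frozensets."""
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def semi_lattice(stmt, S):
    """Semi-lattice of the CI statement stmt = (A, B, C) over the variable set S."""
    A, B, C = (frozenset(part) for part in stmt)
    return {U for U in powerset(S) if C <= U and not A <= U and not B <= U}

def wdec(stmt):
    """Definition 3.6: one statement I(a, b | C) per pair a in A, b in B."""
    A, B, C = stmt
    return [(a, b, C) for a in A for b in B]

S = "abcd"
c = ("bc", "d", "a")  # the statement I(bc, d | a) of Example 3.4

union = set()
for part in wdec(c):
    union |= semi_lattice(part, S)

print(union == semi_lattice(c, S))  # True
```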

We are now in the position to prove the main result 
concerning the soundness and completeness of the in- 
ference system A for semi-lattice inclusion. 
Theorem 3.8. Let $\mathcal{C}$ be a set of CI statements, and let $c$ be a CI statement. Then $\mathcal{C} \vdash c$ if and only if $\mathcal{L}(\mathcal{C}) \supseteq \mathcal{L}(c)$.

Proof. We already know by Proposition 3.5 that if $\mathcal{C} \vdash c$ then $\mathcal{L}(\mathcal{C}) \supseteq \mathcal{L}(c)$. We now proceed to show the other direction. Let us denote $wdec(\mathcal{C}) = \bigcup_{c' \in \mathcal{C}} wdec(c')$ and let $I(a,b|C) \in wdec(c)$ with $W = \{a,b\}$. From the assumption $\mathcal{L}(\mathcal{C}) \supseteq \mathcal{L}(c)$ and Proposition 3.7(2) it follows that $\mathcal{L}(\mathcal{C}) \supseteq \mathcal{L}(a,b|C)$ (1). By Proposition 3.7(1) it suffices to show that $I(a,b|C) \in wdec(\mathcal{C})^+$. However, we will prove the stronger statement $\forall V \in [C, \overline{W}] : I(a,b|V) \in wdec(\mathcal{C})^+$ by downward induction on the lattice $[C, \overline{W}]$, where $\overline{W} = S - W$.

For the base case we need to show that $I(a,b|\overline{W}) \in wdec(\mathcal{C})^+$. By (1), $\overline{W}$ is in $\mathcal{L}(\mathcal{C})$. Hence, by Proposition 3.7(2), there exists a CI statement $I(a,b|C') \in wdec(\mathcal{C})$ such that $\overline{W} \in \mathcal{L}(a,b|C')$. Now, since $C' \subseteq \overline{W}$, we can derive $I(a,b|\overline{W})$ through strong union.

For the induction step, let $C \subseteq V \subset \overline{W}$. The induction hypothesis states that for all $V'$ with $V \subset V' \subseteq \overline{W}$ one has $I(a,b|V') \in wdec(\mathcal{C})^+$. By (1), $V$ is in $\mathcal{L}(\mathcal{C})$. Hence, by Proposition 3.7(2), there exists a CI statement $I(a',b'|C') \in wdec(\mathcal{C})$ such that $V \in \mathcal{L}(a',b'|C')$. Since $C' \subseteq V$, we can use strong union to derive $I(a',b'|V)$. Let $W' = \{a',b'\}$. Note that $W' \cap V = \emptyset$. We distinguish three cases:

• $W' = W$. Then we are done.

• Exactly one of the two elements in $W'$ is not in $W$. Without loss of generality let this element be $b'$. Then we can use contraction on the statements $I(a,b'|V)$ and $I(a,b|Vb')$ (the latter is in $wdec(\mathcal{C})^+$ by the induction hypothesis) to derive $I(a,b'b|V)$, and finally decomposition to derive $I(a,b|V)$.

• Both elements in $W'$ are not in $W$. We can use strong contraction on the statements $I(a',b'|V)$, $I(a,b|Va')$, and $I(a,b|Vb')$ (the latter two are in $wdec(\mathcal{C})^+$ by the induction hypothesis) to derive $I(a,b|V)$.

This concludes the proof. □

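Example 3.9. Let $S = \{a,b,c,d\}$, let $\mathcal{C} = \{I(a,b|\emptyset), I(c,d|a), I(c,d|b)\}$ and let $c = I(c,d|\emptyset)$. We can derive $c$ from $\mathcal{C}$ using the inference rule strong contraction. In addition, $\mathcal{L}(\mathcal{C}) = \{\emptyset, c, d, cd\} \cup \{a, ab\} \cup \{b, ab\} = \{\emptyset, a, b, c, d, ab, cd\}$ and $\mathcal{L}(c) = \{\emptyset, a, b, ab\}$, and, therefore, $\mathcal{L}(\mathcal{C}) \supseteq \mathcal{L}(c)$.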
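
Example 3.9 can be replayed as a plain set-inclusion test, which is what Theorem 3.8 reduces derivability under A to. The following is a minimal sketch under assumed conventions: CI statements are tuples `(A, B, C)` of strings, and `derivable` is a hypothetical helper name introduced here. It enumerates all subsets of S and is therefore exponential in |S|.

```python
from itertools import combinations

def powerset(S):
    """All subsets of S as frozensets."""
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def semi_lattice(stmt, S):
    """Semi-lattice of stmt = (A, B, C): supersets of C containing neither A nor B."""
    A, B, C = (frozenset(part) for part in stmt)
    return {U for U in powerset(S) if C <= U and not A <= U and not B <= U}

def lattice_union(stmts, S):
    """Union of the semi-lattices of a set of CI statements."""
    out = set()
    for stmt in stmts:
        out |= semi_lattice(stmt, S)
    return out

def derivable(premises, conclusion, S):
    """Theorem 3.8: the conclusion is derivable under system A
    exactly when its semi-lattice is covered by that of the premises."""
    return semi_lattice(conclusion, S) <= lattice_union(premises, S)

S = "abcd"
premises = [("a", "b", ""), ("c", "d", "a"), ("c", "d", "b")]
print(derivable(premises, ("c", "d", ""), S))      # True, as in Example 3.9
print(derivable(premises[:2], ("c", "d", ""), S))  # False: the set {b} is not covered
```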

4 The Additive Implication Problem 
for CI Statements 

An important result in the study of the implication 
problem relative to the class of discrete probability 
measures was gained by Studeny who linked it to 
an additive implication problem (Studeny [9]). More 
specifically, it was shown that for every CI statement $I(A,B|C)$, a discrete probability measure $P$ satisfies $I(A,B|C)$ if and only if the multi-information function¹ $M_P$ induced by $P$ satisfies the equality $M_P(C) + M_P(ABC) = M_P(AC) + M_P(BC)$. Thus, the multiplication-based probabilistic CI implication problem was related to an addition-based implication problem. It is this duality that is at the basis of the results developed in this section. However, rather than immediately focusing on specific classes of multi-information functions, which is what we pursue in Section 6, we first consider the additive implication problem for CI statements relative to arbitrary classes of real-valued functions.

By a real-valued function, we will always mean a function $F : 2^S \rightarrow \mathbb{R}$, i.e., a function that maps each subset of $S$ to a real number.

Definition 4.1. Let $I(A,B|C)$ be a CI statement, and let $F$ be a real-valued function. We say that $F$ a-satisfies $I(A,B|C)$, and write $\models^a_F I(A,B|C)$, if $F(C) + F(ABC) = F(AC) + F(BC)$.

¹The multi-information function of a probability measure will be formally defined in Section 6.



Relative to the notion of a-satisfaction, we can now define the additive implication problem for conditional independence statements.

Definition 4.2 (Additive implication problem). Let $\mathcal{C}$ be a set of CI statements, let $c$ be a CI statement, and let $\mathcal{F}$ be a class of real-valued functions. We say that $\mathcal{C}$ a-implies $c$ relative to $\mathcal{F}$, and write $\mathcal{C} \models^a_{\mathcal{F}} c$, if each function $F \in \mathcal{F}$ that a-satisfies the CI statements in $\mathcal{C}$ also a-satisfies the CI statement $c$.

We now define the notion of density of a real-valued function. The density is again a real-valued function and plays a crucial role in reasoning about additive implication problems.

Definition 4.3. Let $F$ be a real-valued function. The density² of $F$ is the real-valued function $\Delta F$ defined by
$$\Delta F(X) = \sum_{X \subseteq U \subseteq S} (-1)^{|U|-|X|} F(U)$$
for each $X \subseteq S$.

The following relationship between a real-valued function and its density justifies the name.

Proposition 4.4. Let $F$ be a real-valued function. Then, for each $X \subseteq S$, $F(X) = \sum_{X \subseteq U \subseteq S} \Delta F(U)$.

The a-satisfaction of a real-valued function for a CI statement can be characterized in terms of an equation involving its density function. This characterization is central in developing our results and is a special case of a more general result by Sayrafi and Van Gucht, who used it in their study of the frequent itemset mining problem (Sayrafi and Van Gucht [7]).

Proposition 4.5. Let $I(A,B|C)$ be a CI statement and let $F$ be a real-valued function. Then, $\models^a_F I(A,B|C)$ if and only if $\sum_{U \in \mathcal{L}(A,B|C)} \Delta F(U) = 0$.
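
Propositions 4.4 and 4.5 can be checked numerically on an arbitrary function. The sketch below is illustrative and rests on assumptions made here: a real-valued function is stored as a dictionary from frozensets to integers (integers keep the arithmetic exact), and the helper names `density`, `a_defect`, and `density_sum` are hypothetical. It verifies that summing the density over all supersets recovers F, and that the quantity F(C) + F(ABC) - F(AC) - F(BC) equals the sum of the density over the semi-lattice, so that one vanishes exactly when the other does.

```python
import random
from itertools import combinations

def powerset(S):
    """All subsets of S as frozensets."""
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def density(F, S):
    """Definition 4.3: alternating sum of F over all supersets of X."""
    subsets = powerset(S)
    return {X: sum((-1) ** (len(U) - len(X)) * F[U] for U in subsets if X <= U)
            for X in subsets}

def a_defect(F, A, B, C):
    """F(C) + F(ABC) - F(AC) - F(BC); zero exactly when F a-satisfies I(A,B|C)."""
    A, B, C = frozenset(A), frozenset(B), frozenset(C)
    return F[C] + F[A | B | C] - F[A | C] - F[B | C]

def density_sum(dF, A, B, C):
    """Sum of the density over the semi-lattice of I(A,B|C)."""
    A, B, C = frozenset(A), frozenset(B), frozenset(C)
    return sum(v for U, v in dF.items()
               if C <= U and not A <= U and not B <= U)

random.seed(0)
S = "abcd"
F = {X: random.randint(-5, 5) for X in powerset(S)}
dF = density(F, S)

# Proposition 4.4: F(X) is the sum of the density over all supersets of X.
print(all(F[X] == sum(dF[U] for U in dF if X <= U) for X in F))

# Identity behind Proposition 4.5, here for the statement I(bc, d | a).
print(a_defect(F, "bc", "d", "a") == density_sum(dF, "bc", "d", "a"))
```

Both checks hold for every function F, not only for the randomly drawn one, since they follow from inclusion-exclusion.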

5 Properties of Classes of Functions - 
Soundness and Completeness 

In this section we study properties of classes of real- 
valued functions that guarantee soundness and com- 
pleteness of A, respectively, for the additive implica- 
tion problem. How these results relate to probabilis- 
tic conditional independence implication will become 
clear in Section 7 and Section 8. 

5.1 Soundness 

First, we define the notion of soundness of system A for a given class of real-valued functions.

Definition 5.1 (Soundness). Let $\mathcal{F}$ be a class of real-valued functions. We say that A is sound relative to $\mathcal{F}$ if, for each set $\mathcal{C}$ of CI statements and each CI statement $c$, we have that $\mathcal{C} \vdash c$ implies $\mathcal{C} \models^a_{\mathcal{F}} c$.

²What we call the density is sometimes referred to as the Möbius inversion of a real-valued function.

In order to characterize soundness we introduce the following property of classes of real-valued functions.

Definition 5.2 (Zero-density property). Let $\mathcal{F}$ be a class of real-valued functions. We say that $\mathcal{F}$ has the zero-density property if, for each $F \in \mathcal{F}$, for each CI statement $c$, and for each $U \in \mathcal{L}(c)$, one has that if $\models^a_F c$, then $\Delta F(U) = 0$.

We can now provide various characterizations of the soundness of inference system A for the additive implication problem for CI statements.

Theorem 5.3. Let $\mathcal{F}$ be a class of real-valued functions. Then, the following statements are equivalent:

(1) Strong union and decomposition are sound inference rules relative to $\mathcal{F}$ for the additive implication problem;

(2) $\mathcal{F}$ has the zero-density property; and

(3) A is sound relative to $\mathcal{F}$ for the additive implication problem.

Proof. We first prove that statement (1) implies statement (2). Let $F \in \mathcal{F}$, let $I(A,B|C)$ be a CI statement, and assume $\models^a_F I(A,B|C)$. We now show that $\Delta F(V) = 0$ for each $V \in \mathcal{L}(A,B|C)$. The proof goes by downward induction on the semi-lattice $\mathcal{L}(A,B|C)$. First, we observe that by Lemma 3.3, $\mathcal{L}(A,B|C) = \bigcup_{W \in \mathcal{W}(A,B|C)} [C, \overline{W}]$. Hence, for the base case we must prove that $\Delta F(\overline{W}) = 0$ for each $W \in \mathcal{W}(A,B|C)$. Let $W = \{a,b\} \in \mathcal{W}(A,B|C)$. $I(a,b|C)$ is derivable from $I(A,B|C)$ using the inference rule decomposition and therefore $\models^a_F I(a,b|C)$. Since strong union is assumed sound and $C \subseteq \overline{W}$, it follows that $\models^a_F I(a,b|\overline{W})$. Since $\mathcal{L}(a,b|\overline{W}) = \{\overline{W}\}$, we can invoke Proposition 4.5 to conclude that $\Delta F(\overline{W}) = 0$. For the induction step, let $V \in \mathcal{L}(A,B|C)$. The induction hypothesis states that $\Delta F(U) = 0$ for all $U \in \mathcal{L}(A,B|C)$ that are strict supersets of $V$. Similar to the base case, we can infer that $\models^a_F I(A',B'|V)$ with $A'$, $B'$, and $V$ pairwise disjoint, $A' \subseteq A$, $B' \subseteq B$, and $C \subseteq V$. Hence, by Proposition 4.5, $\sum_{U \in \mathcal{L}(A',B'|V)} \Delta F(U) = 0$. For all $U \in \mathcal{L}(A',B'|V)$ with $U \neq V$, we have by Proposition 3.5 that $V \subset U \in \mathcal{L}(A,B|C)$ and, thus, $\Delta F(U) = 0$ by the induction hypothesis. The sum therefore reduces to $\Delta F(V)$, and we conclude that $\Delta F(V) = 0$.

We now prove that statement (2) implies statement (3). Let $\mathcal{C}$ be a set of CI statements, let $c$ be a CI statement, and assume that $\mathcal{C} \vdash c$. Since $\mathcal{F}$ has the zero-density property, we have that for each $F \in \mathcal{F}$, if $\models^a_F \mathcal{C}$ then $\Delta F(U) = 0$ for each $U \in \mathcal{L}(\mathcal{C})$. From $\mathcal{C} \vdash c$ and Proposition 3.5, we have $\mathcal{L}(\mathcal{C}) \supseteq \mathcal{L}(c)$. Hence, by Proposition 4.5, for all $F \in \mathcal{F}$ we have that if $F$ a-satisfies every CI statement in $\mathcal{C}$, then $F$ a-satisfies $c$. Thus, $\mathcal{C} \models^a_{\mathcal{F}} c$.

Finally, statement (1) follows trivially from (3). □



5.2 Completeness 

As with soundness in Subsection 5.1, we begin with the definition of the notion of completeness of inference system A for a given class of real-valued functions.

Definition 5.4 (Completeness). Let $\mathcal{F}$ be a class of real-valued functions. We say that A is complete for the additive implication problem for CI statements relative to $\mathcal{F}$ if, for each set $\mathcal{C}$ of CI statements and each CI statement $c$, one has that $\mathcal{C} \models^a_{\mathcal{F}} c$ implies $\mathcal{C} \vdash c$.

We now introduce certain special real-valued functions that are at the basis of defining a property guaranteeing completeness of system A.

Definition 5.5. Let $V \subseteq S$. The Kronecker-density function of $V$, denoted $\delta_V$, is the real-valued function such that $\delta_V(V) = 1$ and $\delta_V(X) = 0$ if $X \neq V$. The Kronecker-induced function of $V$, denoted $F_V$, is the real-valued function whose density function is the Kronecker-density function of $V$, i.e., for each $X \subseteq S$, $F_V(X) = \sum_{X \subseteq U \subseteq S} \delta_V(U)$.

We can now define a property on classes of real-valued functions that we will show to guarantee the completeness of system A for the additive implication problem.

Definition 5.6 (Kronecker property). Let $\mathcal{F}$ be a class of real-valued functions, and let $\Omega \subseteq 2^S$. We say that $\mathcal{F}$ has the Kronecker property on $\Omega$ if, for each $U \in \Omega$, there exists a $c_U \in \mathbb{R}$ ($c_U \neq 0$), and a set $D_U = \{d_V \in \mathbb{R} \mid V \notin \Omega\}$ such that the following real-valued function is in $\mathcal{F}$:
$$F_{\Omega,c_U,D_U} := c_U F_U + \sum_{V \notin \Omega} d_V F_V.$$

Note that for all $X \in \Omega$, $\Delta F_{\Omega,c_U,D_U}(X) = c_U$ if $X = U$ and $\Delta F_{\Omega,c_U,D_U}(X) = 0$ if $X \neq U$.

Let $\Omega^{(2)}$ be the set of all subsets of $S$ that lack at least two elements of $S$, i.e., $\Omega^{(2)} = \{V \subseteq S \mid |V| \leq |S| - 2\}$. We can now prove that the Kronecker property on $\Omega^{(2)}$ implies the completeness of system A.
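
To make these objects concrete, here is a small hedged sketch; the names `kronecker_induced`, `density`, and `omega2` are assumptions introduced for illustration. It relies on the observation that the sum in Definition 5.5 contains the single non-zero term U = V precisely when X is a subset of V, so the Kronecker-induced function of V equals 1 on subsets of V and 0 elsewhere. The density of that function is then recomputed to confirm it is the Kronecker-density function, and the family of sets of size at most |S| - 2 is listed.

```python
from itertools import combinations

def powerset(S):
    """All subsets of S as frozensets."""
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def kronecker_induced(V, S):
    """F_V(X) = sum of delta_V(U) over X <= U <= S, i.e. 1 if X <= V, else 0."""
    V = frozenset(V)
    return {X: (1 if X <= V else 0) for X in powerset(S)}

def density(F, S):
    """Definition 4.3: alternating sum of F over all supersets of X."""
    subsets = powerset(S)
    return {X: sum((-1) ** (len(U) - len(X)) * F[U] for U in subsets if X <= U)
            for X in subsets}

def omega2(S):
    """All subsets of S that lack at least two elements of S."""
    return {V for V in powerset(S) if len(V) <= len(S) - 2}

S = "abc"
F_a = kronecker_induced("a", S)
dF_a = density(F_a, S)

# The density is 1 at {a} and 0 everywhere else.
print({"".join(sorted(X)): v for X, v in dF_a.items() if v != 0})  # {'a': 1}

# For |S| = 3 the family consists of the empty set and the singletons.
print(sorted("".join(sorted(V)) for V in omega2(S)))  # ['', 'a', 'b', 'c']
```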

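Theorem 5.7. Let $\mathcal{F}$ be a class of real-valued functions. If $\mathcal{F}$ has the Kronecker property on $\Omega^{(2)}$, then system A is complete for the additive implication problem for CI statements relative to $\mathcal{F}$.

Proof. Assume that $\mathcal{F}$ has the Kronecker property on $\Omega^{(2)}$ but that A is not complete. Then there exists a set $\mathcal{C}$ of CI statements and a CI statement $c$ such that $\mathcal{C} \models^a_{\mathcal{F}} c$ but $\mathcal{C} \nvdash c$, or, equivalently by Theorem 3.8, $\mathcal{L}(c) \not\subseteq \mathcal{L}(\mathcal{C})$. Let $U \in \mathcal{L}(c) - \mathcal{L}(\mathcal{C})$. $U$ must be an element of $\Omega^{(2)}$ by Lemma 3.3. Since $\mathcal{F}$ has the Kronecker property on $\Omega^{(2)}$, we know that there exists a $c_U \in \mathbb{R}$ ($c_U \neq 0$), and a set $D_U = \{d_V \in \mathbb{R} \mid V \notin \Omega^{(2)}\}$ such that $F_{\Omega^{(2)},c_U,D_U} \in \mathcal{F}$. By Definition 5.6, $\Delta F_{\Omega^{(2)},c_U,D_U}(U) = c_U \neq 0$ and $\Delta F_{\Omega^{(2)},c_U,D_U}(X) = 0$ for all other $X \in \Omega^{(2)}$. From Proposition 4.5 it follows that $\models^a_{F_{\Omega^{(2)},c_U,D_U}} \mathcal{C}$, but $\not\models^a_{F_{\Omega^{(2)},c_U,D_U}} c$, a contradiction to $\mathcal{C} \models^a_{\mathcal{F}} c$. □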

The following example demonstrates the zero-density and Kronecker properties.

Example 5.8. Let $S = \{a,b,c\}$, let $\mathcal{F}_1 = \{F_\emptyset, F_a, F_b, F_c\}$ and $\mathcal{F}_2 = \{F_x\}$, where the densities of each real-valued function are given by the table in Figure 2. The densities of the remaining subsets of $S$ are assumed to be 0 for each function. Now, $\Omega^{(2)} = \{\emptyset, a, b, c\}$ and, therefore, $\mathcal{F}_1$ has the Kronecker property on $\Omega^{(2)}$, since $F_{\Omega^{(2)},c_U,D_U} = F_U$ for all $U \in \Omega^{(2)}$, and the zero-density property. $\mathcal{F}_2$ does not have the Kronecker property. It also does not have the zero-density property, as $\models^a_{F_x} I(b,c|\emptyset)$ but $\Delta F_x(\emptyset) \neq 0$.

Figure 2: Densities of several real-valued functions.



6 The Conditional Independence 
Implication Problem 

While the theory presented so far has been concerned 
with the additive implication problem for CI state- 
ments, it is also applicable to the conditional in- 
dependence implication problem. The link between 
these two problems is made with the concept of multi- 
information functions (Studeny [9]) induced by proba- 
bility measures. In this paper we will restrict ourselves 
to the class of discrete probability measures. 

Definition 6.1. A probability model over $S = \{s_1, \ldots, s_n\}$ is a pair $(dom, P)$, where $dom$ is a domain mapping that maps each $s_i$ to a finite domain $dom(s_i)$, and $P$ is a probability measure having $dom(s_1) \times \cdots \times dom(s_n)$ as its sample space. For $A = \{a_1, \ldots, a_k\} \subseteq S$, we will say that $\mathbf{a}$ is a domain vector of $A$ if $\mathbf{a} \in dom(a_1) \times \cdots \times dom(a_k)$.

In what follows, we will only refer to probability measures, keeping their probability models implicit.

Definition 6.2. Let $I(A,B|C)$ be a CI statement, and let $P$ be a probability measure. We say that $P$ m-satisfies $I(A,B|C)$, and write $\models^m_P I(A,B|C)$, if for every domain vector $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ of $A$, $B$, and $C$, respectively, $P(\mathbf{c})P(\mathbf{a},\mathbf{b},\mathbf{c}) = P(\mathbf{a},\mathbf{c})P(\mathbf{b},\mathbf{c})$.
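
The product condition of Definition 6.2 can be tested directly on a small joint distribution. The sketch below is an illustration under assumptions made here: the distribution is a dictionary from outcome tuples to probabilities, variables are single-letter names, and `prob` and `m_satisfies` are hypothetical helper names. The example distribution (two independent fair bits x and y, with z their exclusive-or) is chosen for the illustration and does not come from the paper.

```python
from itertools import product

def prob(P, variables, assignment):
    """Marginal probability of a partial assignment {variable: value}."""
    return sum(p for outcome, p in P.items()
               if all(outcome[variables.index(v)] == val
                      for v, val in assignment.items()))

def m_satisfies(P, variables, domains, A, B, C, tol=1e-12):
    """Check P(c) P(a,b,c) = P(a,c) P(b,c) for all domain vectors a, b, c."""
    scope = list(A) + list(B) + list(C)
    for values in product(*(domains[v] for v in scope)):
        full = dict(zip(scope, values))
        c = {v: full[v] for v in C}
        ac = {v: full[v] for v in list(A) + list(C)}
        bc = {v: full[v] for v in list(B) + list(C)}
        lhs = prob(P, variables, c) * prob(P, variables, full)
        rhs = prob(P, variables, ac) * prob(P, variables, bc)
        if abs(lhs - rhs) > tol:
            return False
    return True

variables = "xyz"
domains = {v: (0, 1) for v in variables}
# x and y are independent fair bits, z = x XOR y.
P = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}

print(m_satisfies(P, variables, domains, "x", "y", ""))   # True
print(m_satisfies(P, variables, domains, "x", "y", "z"))  # False
```

The second check fails because conditioning on z makes x and y fully dependent, which also shows that strong union is not sound for probability measures.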



Relative to the notion of m-satisfaction we can now define the probabilistic conditional independence implication problem.

Definition 6.3 (Probabilistic conditional independence implication problem). Let $\mathcal{C}$ be a set of CI statements, let $c$ be a CI statement, and let $\mathcal{P}$ be the class of discrete probability measures. We say that $\mathcal{C}$ m-implies $c$ relative to $\mathcal{P}$, and write $\mathcal{C} \models^m_{\mathcal{P}} c$, if each function $P \in \mathcal{P}$ that m-satisfies the CI statements in $\mathcal{C}$ also m-satisfies the CI statement $c$. The set $\{c \mid \mathcal{C} \models^m_{\mathcal{P}} c\}$ will be denoted by $\mathcal{C}^*$.

Next, we define the multi-information function induced by a probability measure (Studeny [9]), which is based on the Kullback-Leibler divergence (Kullback and Leibler [4]).

Definition 6.4. Let $P$ and $Q$ be two probability measures over a discrete sample space. Then, the relative entropy (Kullback-Leibler divergence) $H$ is defined as
$$H(P|Q) := \sum_{x :\, P(x) > 0} P(x) \log \frac{P(x)}{Q(x)},$$
with $x$ ranging over all elements of the discrete sample space.

Definition 6.5. Let $P$ be a probability measure, and let $H$ be the relative entropy. The multi-information function $M_P : 2^S \rightarrow [0, \infty]$ induced by $P$ is defined as
$$M_P(A) := H\Big(P^A \,\Big|\, \prod_{a \in A} P^{\{a\}}\Big)$$
for each non-empty subset $A$ of $S$, and $M_P(\emptyset) = 0$.³

The class of multi-information functions induced by the class of discrete probability measures $\mathcal{P}$ will be denoted by $\mathcal{M}$. We can now state the fundamental result of Studeny that couples the probabilistic CI implication problem with the additive implication problem for CI statements relative to $\mathcal{M}$.

Theorem 6.6 (Studeny [9]). Let $\mathcal{C}$ be a set of CI statements and let $c$ be a CI statement. Then, $\mathcal{C} \models^m_{\mathcal{P}} c$ if and only if $\mathcal{C} \models^a_{\mathcal{M}} c$.
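
The additive side of this correspondence can be illustrated by computing multi-information values for a concrete distribution. The following sketch is a hedged example: the dictionary encoding of the distribution, the helper names `marginal`, `multi_information`, and `a_satisfies`, and the exclusive-or distribution are assumptions made for illustration. It evaluates the multi-information function as the relative entropy between a marginal and the product of its one-variable marginals, and then tests the equality M(C) + M(ABC) = M(AC) + M(BC) up to a numerical tolerance.

```python
import math
from collections import defaultdict

def marginal(P, variables, subset):
    """Marginal distribution of P over the variables in subset (kept in the given order)."""
    idx = [variables.index(v) for v in subset]
    out = defaultdict(float)
    for outcome, p in P.items():
        out[tuple(outcome[i] for i in idx)] += p
    return dict(out)

def multi_information(P, variables, subset):
    """Relative entropy between the marginal over subset and the
    product of its one-variable marginals; 0 for the empty set."""
    subset = sorted(subset)
    if not subset:
        return 0.0
    joint = marginal(P, variables, subset)
    singles = [marginal(P, variables, [v]) for v in subset]
    total = 0.0
    for values, p in joint.items():
        if p > 0:
            q = math.prod(singles[i][(values[i],)] for i in range(len(subset)))
            total += p * math.log(p / q)
    return total

def a_satisfies(P, variables, A, B, C, tol=1e-9):
    """Additive test: M(C) + M(ABC) = M(AC) + M(BC)."""
    M = lambda X: multi_information(P, variables, set(X))
    return abs(M(C) + M(A + B + C) - M(A + C) - M(B + C)) <= tol

variables = "xyz"
# x and y are independent fair bits, z = x XOR y.
P = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}

print(a_satisfies(P, variables, "x", "y", ""))   # True
print(a_satisfies(P, variables, "x", "y", "z"))  # False
```

For this distribution the multi-information of all three variables equals log 2, while every pair of variables is independent, so the additive equality holds for the empty conditioning set and fails once z is conditioned on.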

7 Saturated CI Statements - 

Soundness and Completeness of A 

In this section we show that system A is sound and complete for the probabilistic CI implication problem for saturated CI statements. We recall that a CI statement $I(A,B|C)$ is saturated if $ABC = S$. We begin by showing the following technical lemma.

³Here, $P^A$ and $P^{\{a\}}$ denote the marginal probability measures of $P$ over $A$ and $\{a\}$, respectively.



Lemma 7.1. The class of multi-information functions $\mathcal{M}$ induced by the class of discrete probability measures has the zero-density property with respect to saturated CI statements.

Proof. We have to show that for each saturated CI statement $c$, for each $M \in \mathcal{M}$, and for each $U \in \mathcal{L}(c)$, if $\models^a_M c$, then $\Delta M(U) = 0$. The semi-graphoid inference rules are sound relative to the class of probability measures. Hence, in particular, by Theorem 6.6, weak union is sound relative to $\mathcal{M}$, i.e., $\{I(AD,B|C)\} \models^a_{\mathcal{M}} I(A,B|CD)$. Let $M \in \mathcal{M}$, let $\Delta M$ be the corresponding density function, and let $\models^a_M I(A,B|C)$ with $ABC = S$. In addition, let $I(A,B|C)$ be non-trivial, since the lemma is obviously true for trivial CI statements. We will prove by downward induction on the semi-lattice $\mathcal{L}(A,B|C)$ that $\Delta M(U) = 0$ for each $U \in \mathcal{L}(A,B|C)$. Note that this proof is similar to the proof of Theorem 5.3. (Here, weak union is used instead of decomposition and strong union.)

For the base case, we show for each $W \in \mathcal{W}(A,B|C)$ that $\Delta M(\overline{W}) = 0$. Let $W = \{a,b\}$. By repeatedly applying weak union we can derive $\models^a_M I(a,b|\overline{W})$ because $ABC = S$. Now, since $\mathcal{L}(a,b|\overline{W}) = \{\overline{W}\}$, we can conclude that $\Delta M(\overline{W}) = 0$.

For the induction step, let $V \in \mathcal{L}(A,B|C)$. The induction hypothesis states that $\Delta M(U) = 0$ for each $U \in \mathcal{L}(A,B|C)$ with $U$ a strict superset of $V$. From the given CI statement $I(A,B|C)$ we can derive, again by weak union, $I(A',B'|V)$ with $VA'B' = S$ since $V - C \subseteq AB$. Since $\mathcal{L}(A',B'|V)$ contains only $V$ and strict supersets $V'$ of $V$, with $V' \in \mathcal{L}(A,B|C)$, we can conclude that $\sum_{U \in \mathcal{L}(A',B'|V)} \Delta M(U) = \Delta M(V) = 0$ by the induction hypothesis. □

We are now in the position to prove that inference 
system A is sound and complete for the probabilistic 
implication problem for saturated conditional indepen- 
dence statements. 

Theorem 7.2. A is sound and complete for the prob- 
abilistic conditional independence implication problem 
for saturated CI statements. 

Proof. The soundness follows directly from 
Lemma 7.1, Theorem 5.3, and Theorem 6.6. To 
show completeness, notice that the semi-graphoid 
axioms are derivable under inference system A. 
Furthermore, Geiger and Pearl proved that the semi- 
graphoid axioms are complete for the probabilistic 
conditional independence implication problem for 
saturated CI statements (Geiger and Pearl [3]). □ 



8 CI Statements - Completeness of A 

In this section we will show that inference system A is complete for the probabilistic conditional independence implication problem. We first prove that $\mathcal{M}$ has the Kronecker property on $\Omega^{(2)}$. To show this, it would be sufficient to construct a set of discrete probability measures whose induced multi-information functions are Kronecker-induced functions. However, instead of taking this route, we pursue a different approach by first focusing on results with respect to saturated CI statements. We first need the following simple lemma.

Lemma 8.1. For $U \subseteq S$,
$$\{X \in \Omega^{(2)} \mid X \supseteq U\} = \bigcup_{\substack{U_1 \cup U_2 = \overline{U} \\ U_1 \cap U_2 = \emptyset}} \mathcal{L}(U_1, U_2 | U).$$

Proposition 8.2. Let $\mathcal{F}$ be a class of real-valued functions. If A is sound and complete for the additive implication problem relative to $\mathcal{F}$ for saturated CI statements, then $\mathcal{F}$ has the Kronecker property on $\Omega^{(2)}$.

Proof. If $|S| \leq 1$, then $\Omega^{(2)} = \emptyset$ and the statement follows trivially. Hence, assume that $|S| \geq 2$. Suppose that A is sound and complete for saturated CI statements but that $\mathcal{F}$ does not have the Kronecker property on $\Omega^{(2)}$. Then there exists a set $U \in \Omega^{(2)}$ such that for each $c_U \in \mathbb{R}$ ($c_U \neq 0$), and for each set $D_U = \{d_V \in \mathbb{R} \mid V \notin \Omega^{(2)}\}$, we have $F_{\Omega^{(2)},c_U,D_U} \notin \mathcal{F}$.

Now, let $\mathcal{C}$ be the set of saturated CI statements
$$\{I(U, \overline{U} | \emptyset)\} \;\cup \bigcup_{\substack{U_1 \cup U_2 = U \\ U_1 \cap U_2 = \emptyset}} \{I(U_1, U_2 | \overline{U})\} \;\cup\; \bigcup_{v \in \overline{U}} \; \bigcup_{\substack{V_1 \cup V_2 = \overline{U} - \{v\} \\ V_1 \cap V_2 = \emptyset}} \{I(V_1, V_2 | U \cup \{v\})\},$$
and let $c$ be the saturated CI statement $I(U_1, U_2 | U)$ for some non-empty sets $U_1$ and $U_2$. Notice that such sets exist because $|\overline{U}| \geq 2$. By Lemma 8.1, $\mathcal{L}(\mathcal{C}) = \Omega^{(2)} - \{U\}$ and $U \in \mathcal{L}(c) \subseteq \Omega^{(2)}$, and therefore $\mathcal{L}(c) \not\subseteq \mathcal{L}(\mathcal{C})$. Hence, by Theorem 3.8, $\mathcal{C} \nvdash c$. We now show that $\mathcal{C} \models^a_{\mathcal{F}} c$ to obtain the contradiction to the completeness of A. If there does not exist an $F \in \mathcal{F}$ which a-satisfies $\mathcal{C}$, we are done because then $\mathcal{C} \models^a_{\mathcal{F}} c$ follows trivially. Thus, let $F$ be in $\mathcal{F}$ and assume that $\models^a_F \mathcal{C}$. Since A is sound relative to $\mathcal{F}$ for saturated CI statements, we know by Theorem 5.3 that $\mathcal{F}$ has the zero-density property. Thus, $\Delta F(X) = 0$ for each $X \in \Omega^{(2)}$ with $X \neq U$. But then $\Delta F(U) = 0$, since otherwise there would exist a $c_U \in \mathbb{R}$, $c_U = \Delta F(U) \neq 0$, and a set $D_U = \{d_V \in \mathbb{R} \mid V \notin \Omega^{(2)}\}$ such that $F_{\Omega^{(2)},c_U,D_U} = F \in \mathcal{F}$. Hence, $F$ must be a function whose density is zero on every element of $\Omega^{(2)}$. Thus, $\models^a_F c$ and it follows that $\mathcal{C} \models^a_{\mathcal{F}} c$. □

The completeness of A for the CI implication problem 
can now be proved based on the previous results. 



Theorem 8.3. A is complete for the probabilistic conditional independence implication problem.

Proof. We know from Theorem 7.2 that A is sound and complete relative to $\mathcal{M}$ for saturated CI statements. Now, by Proposition 8.2, $\mathcal{M}$ has the Kronecker property on $\Omega^{(2)}$. Finally, through Theorem 5.7 and Theorem 6.6, the statement follows. □

Example 8.4. Studeny [9] described the following sound inference rule relative to discrete probability measures, which refuted the conjecture (Pearl [6]) that the semi-graphoid axioms are complete for the probabilistic CI implication problem:

$$I(A,B|CD) \wedge I(C,D|A) \wedge I(C,D|B) \wedge I(A,B|\emptyset) \rightarrow I(C,D|AB) \wedge I(A,B|C) \wedge I(A,B|D) \wedge I(C,D|\emptyset).$$

By applying strong contraction to the statements $I(A,B|\emptyset)$, $I(C,D|A)$, and $I(C,D|B)$ we can derive the statement $I(C,D|\emptyset)$. All the other statements can be derived using strong union.

Remark 8.5. The inference system A without strong contraction is not complete. The consequence $I(C,D|\emptyset)$ of the rule from Example 8.4 cannot be derived from the antecedents without strong contraction.

9 Complete Axiomatization of Stable 
Independence 

When new information is available to a probabilistic system, the set of associated relevant CI statements changes dynamically. However, some of the CI statements will continue to hold. These CI statements were termed stable by de Waal and van der Gaag [2]. A first investigation of their structural properties was undertaken by Matus, who used the term ascending conditional independence (Matus [5]). Every set of CI statements can be partitioned into its stable and unstable part. We will show that inference system A is sound and complete for the probabilistic CI implication problem for stable conditional independence statements.

Definition 9.1. Let $\mathcal{C}$ be a set of CI statements, and let $\mathcal{C}^{SG+}$ be the semi-graphoid closure of $\mathcal{C}$. Then $I(A,B|C)$ is said to be stable in $\mathcal{C}$ if $I(A,B|C') \in \mathcal{C}^{SG+}$ for all sets $C'$ with $C \subseteq C' \subseteq S$.

Theorem 9.2. Let $\mathcal{C}_S$ be a set of stable CI statements. Then, A is sound and complete for the probabilistic conditional independence implication problem for $\mathcal{C}_S$, or, equivalently, $\mathcal{C}_S^+ = \mathcal{C}_S^*$.

Proof. The soundness follows from Theorem 5.3 and from strong union and decomposition being sound inference rules relative to $\mathcal{M}$ for stable CI statements. The completeness follows from Theorem 8.3. □



Remark 9.3. The previous result is also interesting 
with respect to the problem of finding a minimal, non- 
redundant representation of stable independence re- 
lations. Here, lattice-inclusion could aid the lossless 
compaction of representations of stable CI statements: 
ℒ(𝒞_S − {c}) = ℒ(𝒞_S) if and only if c is redundant in 𝒞_S. 
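
The redundancy test of Remark 9.3 can be illustrated with a small Python sketch. It is not the authors' implementation. It assumes that the semi-lattice ℒ(A, B | C) consists of all sets U with C ⊆ U ⊆ S that include neither A nor B as a subset, and that ℒ of a set of statements is the union of the individual semi-lattices; these definitions belong to earlier sections and are restated here only as assumptions. The function names and the encoding of a CI statement as a triple of strings (one character per attribute) are hypothetical.

```python
from itertools import combinations

def lattice(stmt, ground):
    """Assumed semi-lattice of I(A, B | C): all U with C <= U <= S
    that contain neither A nor B as a subset."""
    a, b, c = (frozenset(x) for x in stmt)
    items = sorted(ground)
    all_subsets = (frozenset(u) for r in range(len(items) + 1)
                   for u in combinations(items, r))
    return {u for u in all_subsets if c <= u and not a <= u and not b <= u}

def lattice_union(stmts, ground):
    """Union of the semi-lattices of all statements in stmts."""
    result = set()
    for s in stmts:
        result |= lattice(s, ground)
    return result

def redundant(stmts, ground):
    """Statements whose removal leaves the union of semi-lattices unchanged."""
    full = lattice_union(stmts, ground)
    return [s for s in stmts
            if lattice_union([t for t in stmts if t != s], ground) == full]

S = set("ABC")
# I(A, B | {}) and I(A, B | C): the second follows from the first by strong union.
stable = [("A", "B", ""), ("A", "B", "C")]
redundant_stmts = redundant(stable, S)
```

In this toy instance only I(A, B | C) is reported as redundant, since its semi-lattice is already covered by that of I(A, B | ∅). The enumeration is exponential in |S| and is intended for small ground sets only.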

10 Falsification Algorithm 

Theorem 3.8 and Theorem 8.3 lend themselves to a 
falsification algorithm, that is, an algorithm which can 
falsify instances of the probabilistic conditional inde- 
pendence implication problem. We consider the fol- 
lowing corollary which directly follows from these two 
results. 

Corollary 10.1. Let 𝒞 be a set of CI statements, let c 
be a CI statement, and let 𝒫 be the class of discrete 
probability measures. If ℒ(𝒞) ⊉ ℒ(c), then 𝒞 ⊭_𝒫 c. 

If the falsified implications were, on average, only 
a small fraction of all those that are falsifiable, the 
result would be disappointing from a practical point 
of view. Fortunately, we will not only be able to show 
that a large number of implications can be falsified by 
the "lattice-exclusion" criterion identified in Corol- 
lary 10.1, but also that polynomial time heuristics ex- 
ist that provide good approximations of said criterion. 

Falsification Criterion. Input: A set of CI statements 
𝒞 and a CI statement c. Test: if ℒ(𝒞) ⊉ ℒ(c), 
return "false", else return "unknown." 
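
The criterion can be sketched in a few lines of Python. The sketch is illustrative and not the authors' implementation. It assumes that ℒ(A, B | C) consists of all sets U with C ⊆ U ⊆ S that include neither A nor B as a subset, and that ℒ(𝒞) is the union of the semi-lattices of the statements in 𝒞; both definitions come from earlier sections and are restated here only as assumptions. The function names and the encoding of CI statements as triples of strings are hypothetical.

```python
from itertools import combinations

def subsets(ground):
    """All subsets of the ground set, as frozensets."""
    items = sorted(ground)
    return [frozenset(u) for r in range(len(items) + 1)
            for u in combinations(items, r)]

def lattice(stmt, ground):
    """Assumed semi-lattice of I(A, B | C): all U with C <= U <= S
    that contain neither A nor B as a subset."""
    a, b, c = (frozenset(x) for x in stmt)
    return {u for u in subsets(ground)
            if c <= u and not a <= u and not b <= u}

def falsify(antecedents, consequence, ground):
    """Lattice-exclusion test: "false" if L(consequence) is not
    included in the union of the antecedents' semi-lattices."""
    covered = set()
    for stmt in antecedents:
        covered |= lattice(stmt, ground)
    return "unknown" if lattice(consequence, ground) <= covered else "false"

# Toy ground set in which A, B, C, D are single attributes.
S = set("ABCD")

# Intersection: I(A, B | DC) and I(A, D | BC) -> I(A, BD | C)
intersection = falsify([("A", "B", "DC"), ("A", "D", "BC")],
                       ("A", "BD", "C"), S)

# Rule of Example 8.4, restricted to the consequence I(C, D | {})
example_8_4 = falsify([("A", "B", "CD"), ("C", "D", "A"),
                       ("C", "D", "B"), ("A", "B", "")],
                      ("C", "D", ""), S)
```

On this ground set the sketch returns "false" for the intersection rule and "unknown" for the rule of Example 8.4, which is consistent with the latter being a sound rule. The enumeration is exponential in |S|, so it only serves to make the criterion concrete.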

Heuristic 1. Input: A set of CI statements 𝒞 
and a CI statement I(A, B | C). Test: if C ⊉ C′ for 
each I(A′, B′ | C′) ∈ 𝒞, return "false", else 
return "unknown." 

Heuristic 2. Input: A set of CI statements 𝒞 and 
a CI statement I(A, B | C). Test: if there exists a 
W ∈ 𝒲(A, B | C) such that W ∉ 𝒲(A′, B′ | C′) for all 
I(A′, B′ | C′) ∈ 𝒞, return "false", else return 
"unknown." 

It follows from Lemma 3.3 that if one of the two 
heuristics returns "false," then ℒ(𝒞) ⊉ ℒ(c), and 
therefore 𝒞 ⊭_𝒫 c by Corollary 10.1. 
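
Both heuristics admit a direct transcription, sketched below in Python under stated assumptions. Heuristic 1 only compares conditioning sets. For Heuristic 2 the sketch assumes that the witness set 𝒲(A, B | C) consists of the sets S − {a, b} with a ∈ A and b ∈ B, that is, the maximal elements of the semi-lattice; this matches the description of the joins in Remark 10.3, but the formal definition of 𝒲 is not part of this section, so it should be read as an assumption. Function names and the string encoding of statements are hypothetical.

```python
def heuristic_1(antecedents, consequence):
    """"false" if no antecedent's conditioning set C' is a subset of C."""
    c = frozenset(consequence[2])
    if all(not c >= frozenset(c2) for _, _, c2 in antecedents):
        return "false"
    return "unknown"

def witnesses(stmt, ground):
    """Assumed witness set of I(A, B | C): S - {a, b} for a in A, b in B."""
    a, b, _ = stmt
    return {frozenset(ground) - {x, y} for x in a for y in b}

def heuristic_2(antecedents, consequence, ground):
    """"false" if some witness of the consequence is a witness of no antecedent."""
    covered = set()
    for stmt in antecedents:
        covered |= witnesses(stmt, ground)
    if any(w not in covered for w in witnesses(consequence, ground)):
        return "false"
    return "unknown"

# Toy ground set in which A, B, C, D are single attributes.
S = set("ABCD")

# Intersection: I(A, B | DC) and I(A, D | BC) -> I(A, BD | C)
ante, cons = [("A", "B", "DC"), ("A", "D", "BC")], ("A", "BD", "C")
h1_intersection = heuristic_1(ante, cons)
h2_intersection = heuristic_2(ante, cons, S)

# An invalid inference: I(A, B | {}) -> I(A, C | {})
h1_other = heuristic_1([("A", "B", "")], ("A", "C", ""))
h2_other = heuristic_2([("A", "B", "")], ("A", "C", ""), S)
```

In these two toy instances each heuristic rejects an inference that the other leaves undecided: Heuristic 1 rejects intersection while Heuristic 2 returns "unknown", and the roles are reversed for the second inference. This illustrates why the two tests are worth combining.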

Example 10.2. Let S be a finite set, and let 
A, B, C, and D be pairwise disjoint subsets 
of S. The inference rule intersection, 
I(A, B | DC) ∧ I(A, D | BC) → I(A, BD | C), is not 
sound relative to the class of discrete probability 
measures. Heuristic 1 can reject this instance of the 
implication problem in polynomial time in the size of S, 
since C ⊉ DC and C ⊉ BC whenever B and D are nonempty. 

Remark 10.3. The falsification criterion leads in fact 
to a family of polynomial time heuristics. While 
Heuristic 1 checks if the unique meet (greatest lower 
bound) of the semi-lattice ℒ(c) is not in ℒ(𝒞) and 
Heuristic 2 if the (potentially multiple) joins (least 
upper bounds) of the semi-lattice ℒ(c) are not in ℒ(𝒞), 
we may select additional elements in the semi-lattice 
ℒ(c) that are located between these two extrema to 
derive more falsification heuristics. 

Figure 3: Rejection and acceptance curves of the racing 
and falsification algorithms, respectively, for five 
attributes. 

[x-axis: number of antecedents] 

Figure 4: Falsifications based on the lattice-exclusion 
criterion and the heuristics, for five attributes. The 
combination of the heuristics reaches 95% of the 
falsifications of the full-blown lattice-exclusion 
criterion for 3 antecedents down to 77% for 10 antecedents. 

With our experiments we want to show that (1) the 
lattice-exclusion criterion can falsify a large fraction 
of all falsifiable implications, and (2) the two 
provided heuristics are good approximations of the 
full-blown lattice-exclusion criterion. To make our 
outcomes comparable to existing results, we adopted 
the experimental setup for the racing algorithm from 
Bouckaert and Studeny [1] (also using 5 attributes). 
For each number of antecedents from 3 up to 10, a 
thousand sets of antecedents were generated by randomly 
selecting elementary CI statements,* resulting in a 
total of 8000 sets of antecedents. 

*An elementary CI statement is of the form I(a, b | C), 
where a, b ∈ S and C ⊆ S − {a, b}. 



The falsification algorithm and the heuristics were run 
on these sets with each of the remaining elementary CI 
statements as consequence, one at a time. Since there 
are 80 elementary CI statements for 5 attributes, this 
resulted in 77000 implication problems for sets with 3 
antecedents, 76000 for sets with 4 antecedents, down 
to 70000 for sets with 10 antecedents. 
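
The counts above can be reproduced with a short enumeration. The Python sketch below is illustrative only; it assumes that I(a, b | C) and I(b, a | C) are counted once, which is the convention under which five attributes yield 80 elementary statements. The function name and the attribute labels are hypothetical.

```python
from itertools import combinations

def elementary_statements(attributes):
    """All elementary CI statements I(a, b | C) with a != b and
    C a subset of the remaining attributes (unordered pairs {a, b})."""
    attrs = sorted(attributes)
    stmts = []
    for a, b in combinations(attrs, 2):
        rest = [x for x in attrs if x not in (a, b)]
        for r in range(len(rest) + 1):
            for cond in combinations(rest, r):
                stmts.append((a, b, frozenset(cond)))
    return stmts

total = len(elementary_statements("abcde"))

# 1000 antecedent sets per size k; every remaining elementary
# statement is tested once as the consequence.
problems = {k: 1000 * (total - k) for k in range(3, 11)}
```

With five attributes there are 10 unordered pairs and 2^3 = 8 conditioning sets per pair, hence 80 statements, and 1000 · (80 − k) implication problems for sets with k antecedents.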

The rejection procedure of the racing algorithm is 
rooted in the theory of imsets: an instance is rejected 
if one of the supermodular functions constructed by 
the algorithm is a counter-model for this instance. It 
has exponential running time and might reject impli- 
cations that actually do hold. This is a consequence 
of the fact that ℳ is a strict subset of the class of all 
supermodular functions. (See Examples 4.1 and 6.2 in 
Studeny's monograph [9].) The falsification algorithm 
based on Corollary 10.1, on the other hand, ensures 
that if an instance of the implication problem is re- 
jected, then it is guaranteed not to be valid. 

Figure 3 shows the rejection curves of the racing 
algorithm (b) and the falsification algorithm (c), 
respectively, and the acceptance curve of the racing 
algorithm (d). The area between the two rejection curves 
can be interpreted as the "decision gap", i.e., the 
number of instances of the implication problem for 
which the validity is unknown. The curve marked with 
circles (a) depicts the total number of tested instances. 
Figure 4 depicts the rejection curves for the 
falsification algorithm (a), for the combination of 
Heuristic 1 and Heuristic 2 (b), and for Heuristic 2 (c) 
and Heuristic 1 (d) run separately. The combination of 
the heuristics compares favorably with the full-blown 
falsification criterion. The experiments also show that 
Heuristic 2 is more effective than Heuristic 1. 

11 Conclusion and Future Work 

A complete inference system for the probabilistic 
conditional independence implication problem was 
presented and related to the lattice-exclusion criterion. 
We derived polynomial time approximations that can 
be used as a preprocessing step to efficiently shrink the 
search space of possibly valid inferences. We already 
have experimental evidence that our approach scales 
to much larger instances of the implication problem 
than those reported on in this paper. This could, for 
instance, provide insights into combinatorial bounds 
for the number of (stable) CI structures. The falsifi- 
cation algorithm and the heuristics can be combined 
with algorithms that infer valid implications, like the 
one based on structural imsets which is used as part of 
the racing algorithm [1]. In addition, the 
lattice-exclusion criterion and the heuristics can be 
utilized to store information about conditional 
independencies more efficiently, using non-redundant 
representations. Overall, we believe that the 
lattice-theoretic framework for reasoning about 
conditional independence is a novel and powerful tool. 
We conjecture that there are interesting connections 
between our theory and Studeny's theory of imsets which 
we will continue to investigate. 